Low Latency Index Maintenance in Indri

نویسندگان

  • Trevor Strohman
  • W. Bruce Croft
چکیده

There has been a resurgence of interest in index maintenance (or incremental indexing) in the academic community in the last three years. Most of this work focuses on how to build indexes as quickly as possible, given the need to run queries during the build process. This work is based on a different set of assumptions than previous work. First, we focus on latency instead of throughput. We focus on reducing index latency (the amount of time between when a new document is available to be indexed and when it is available to be queried) and query latency (the amount of time that an incoming query must wait because of index processing). Additionally, we assume that users are unwilling to tune parameters to make the system more efficient. We show how this set of assumptions has driven the development of the Indri index maintenance strategy, and describe the details of our implementation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

OSIR 2006 Second Workshop On

There has been a resurgence of interest in index maintenance (or incremental indexing) in the academic community in the last three years. Most of this work focuses on how to build indexes as quickly as possible, given the need to run queries during the build process. This work is based on a different set of assumptions than previous work. First, we focus on latency instead of throughput. We foc...

متن کامل

DUTIR at TREC 2007 Enterprise Track

This paper describes our experiments on the two tasks of the TREC 2007 Enterprise track. In data preprocessing stage we stripped the non-letter character from documents and query. For the Document Search, we built the index by indri and lemur, handled the query topic and then retrieved relevant documents by indri and lemur. For the Expert Search, we recognized candidates from collection, establ...

متن کامل

Dynamic Collections in Indri

Text search engines have historically been designed for unchanging collections of documents. While this is fine for many applications, a growing number of important applications in news, finance, law and desktop search require indexes that can be efficiently updated. Previous research into supporting dynamic collections revolves around incremental methods. Incremental systems are optimized for ...

متن کامل

Pyndri: A Python Interface to the Indri Search Engine

We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows to access Indri indexes from Python at two levels: (1) dictionary and tokenized document collection, (2) evaluating queries on the index. We hope that with the release of pyndri, we will stimulate reproducible, open and fastpaced IR research.

متن کامل

RMIT and Gunma University at NTCIR-9 GeoTime Task

We participated in the English English and Japanese Japanese subtasks. We selected the Indri search engine as a baseline to test our new class of indexing algorithms. English documents for Indri: Each document was converted to lowercase and written in trec sgml format. We then indexed the collection using Krovetz stemming and stopword removal. English documents for Newt: Each document was conve...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006